{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Conjugate Gradient: the Krylov subspace method for Convex Quadratic Optimization $\\def\\b#1{\\boldsymbol{ #1}}\\def\\B#1{\\boldsymbol{ #1}}$\n",
        "\n",
        "Conjugate gradient can be thought of as a Krylov subspace method for the quadratic optimization problem $\\min(f(\\b x)$ where\n",
        "$$f(\\b x)= \\frac{1}{2}\\b x^T\\b A \\b x + \\b c^T \\b x$$\n",
        "and $\\b A$ is symmetric positive definite (this makes the problem convex).\n",
        "\n",
        "In this demo, we compare the result of conjugate gradient to an explicitly constructed Krylov subspace.\n",
        "\n",
        "We start by picking a random $\\b A$ and $\\b c$:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [],
      "source": [
        "import numpy as np\n",
        "import numpy.linalg as la\n",
        "import scipy.optimize as sopt\n",
        "\n",
        "n = 32\n",
        "# make A a random SPD matrix\n",
        "Q = la.qr(np.random.randn(n, n))[0]\n",
        "A = Q @ (np.diag(np.random.rand(n)) @ Q.T)\n",
        "\n",
        "c = np.random.randn(n)\n",
        "\n",
        "# pick number of steps of CG to execute\n",
        "k = 10\n",
        "\n",
        "# define quadratic objective function\n",
        "def f(x):\n",
        "    return .5*np.inner(x, A @ x) + np.inner(c,x)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Quadratic Programming using Arnoldi\n",
        "\n",
        "First, we form a Krylov subspace using Arnoldi iteration on $\\b A$ with starting guess $\\b c$, then select the minima via\n",
        "$$\\B x_k = -||\\B c||_2\\B Q_k \\B T_k^{-1} \\B e_1 ,$$\n",
        "which is a minimizes within this Krylov subspace since\n",
        "\\begin{align*}\n",
        "\\min_{\\B x \\in \\mathcal{K}_k(\\B A, \\B c)} f(\\B x) &= \\min_{\\B y \\in \\mathbb{R}^k} f(\\B Q_k \\B y)  \\\\\n",
        "&=\\min_{\\B y \\in \\mathbb{R}^k} \\B y^T \\B Q_k \\B A \\B Q_k \\B y + \\B c^T \\B Q_k \\B y \\\\\n",
        "&= \\min_{\\B y \\in \\mathbb{R}^k} \\B y^T \\B T_k \\B y + ||\\B c||_2\\B e_1^T \\B y.\n",
        "\\end{align*}\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-18.581620048126233 = min_{ x_kr in Krylov subspace of dimension  1 } f(x_kr)\n",
            "-22.5369706913421 = min_{ x_kr in Krylov subspace of dimension  2 } f(x_kr)\n",
            "-26.211888363793932 = min_{ x_kr in Krylov subspace of dimension  3 } f(x_kr)\n",
            "-27.259320789625654 = min_{ x_kr in Krylov subspace of dimension  4 } f(x_kr)\n",
            "-27.787499243490494 = min_{ x_kr in Krylov subspace of dimension  5 } f(x_kr)\n",
            "-27.94014696474066 = min_{ x_kr in Krylov subspace of dimension  6 } f(x_kr)\n",
            "-28.013395960580716 = min_{ x_kr in Krylov subspace of dimension  7 } f(x_kr)\n",
            "-28.070399742437132 = min_{ x_kr in Krylov subspace of dimension  8 } f(x_kr)\n",
            "-28.078784618436288 = min_{ x_kr in Krylov subspace of dimension  9 } f(x_kr)\n",
            "-28.084401131355364 = min_{ x_kr in Krylov subspace of dimension  10 } f(x_kr)\n"
          ]
        }
      ],
      "source": [
        "# initialize Krylov subspace, taking first column as c/||c||_2\n",
        "Q = np.zeros((n,k))\n",
        "T = np.zeros((k,k))\n",
        "x_kr = np.zeros((n,k))\n",
        "Q[:,0] = c / la.norm(c)\n",
        "\n",
        "for i in range(1,k+1):\n",
        "    # perform Arnoldi iteration\n",
        "    u = A @ Q[:,i-1]\n",
        "    for j in range(0,i):\n",
        "        T[j,i-1] = np.inner(Q[:,j],u)\n",
        "        u = u - T[j,i-1]*Q[:,j]\n",
        "    if i<k:\n",
        "        T[i,i-1] = la.norm(u)\n",
        "        Q[:,i] = u/T[i,i-1]\n",
        "    # compute and store new best minimizer of f(x)\n",
        "    if i>0:\n",
        "        # compute approximate minima x_kr as -||\\b c||_2\\b Q \\b T^{-1} \\b e_1\n",
        "        e1 = np.zeros(i) \n",
        "        e1[0] = la.norm(c)\n",
        "        x_kr[:,i-1] = - Q[:,:i] @ la.solve(T[:i,:i],e1)\n",
        "        print(f(x_kr[:,i-1]), \"= min_{ x_kr in Krylov subspace of dimension \", i,\"} f(x_kr)\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Nonlinear Programming using Parallel Tangents for Quadratic Objective\n",
        "\n",
        "Next, we implement Conjugate Gradient using the parallel tangents method, which is general to arbitrary nonlinear objective functions (but the convergence and optimality properties are only true for a convex quadratic objective function).\n",
        "\n",
        "The parallel tangent method performs two line minimizations at each iteration:\n",
        "  * Perform a step of steepest descent to generate $\\B{\\hat{x}}_{k}$ from $\\B x_k$.\n",
        "  * Generate $\\B x_{k+1}$ by minimizing over the line passing through $\\B x_{k-1}$ and $\\B{\\hat{x}}_{k}$.\n",
        "\n",
        "To initialize the parallel tangents method, we use the two first approximate minima obtained from the Krylov subspace method above. Subsequent iterates are then reproduced exactly!"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-26.211888363793925  = f( conjugate gradient parallel tangents iterate 2 )\n",
            "-27.259320789625654  = f( conjugate gradient parallel tangents iterate 3 )\n",
            "-27.787499243490497  = f( conjugate gradient parallel tangents iterate 4 )\n",
            "-27.940146964740656  = f( conjugate gradient parallel tangents iterate 5 )\n",
            "-28.013395960580713  = f( conjugate gradient parallel tangents iterate 6 )\n",
            "-28.07039974243711  = f( conjugate gradient parallel tangents iterate 7 )\n",
            "-28.07878461843628  = f( conjugate gradient parallel tangents iterate 8 )\n",
            "-28.08440113135535  = f( conjugate gradient parallel tangents iterate 9 )\n"
          ]
        }
      ],
      "source": [
        "# perform conjugate gradient by parallel tangents method\n",
        "x_cg_pp = np.zeros((n, k))\n",
        "x_cg_pp[:,0] = x_kr[:,0]\n",
        "x_cg_pp[:,1] = x_kr[:,1]\n",
        "\n",
        "# perform CG method via parallel tangents method\n",
        "for i in range(2,k):\n",
        "    x = x_cg_pp[:,i-1]\n",
        "\n",
        "    # define function for steepest descent, based on current iterate x and grad_f(x)\n",
        "    # can use local variables within function\n",
        "    def f_1d(a):\n",
        "        return f(x + a*(A@x+c))\n",
        "\n",
        "    # solve for best steepest descent step using golden section search and form x_hat\n",
        "    a = sopt.golden(f_1d)\n",
        "    x_hat = x + a * (A@x+c)\n",
        "\n",
        "    # define function for extrapolation, based on last iterate x_cg_pp[:,i-2] and x_hat (the solution of steepest descent)\n",
        "    # can use local variables within function\n",
        "    def g_1d(a):\n",
        "        return f(x_hat + a*(x_hat - x_cg_pp[:,i-2]))\n",
        "    # solve for best extrapolation using golden section search and update x\n",
        "    a = sopt.golden(g_1d)\n",
        "    x = x_hat + a*(x_hat - x_cg_pp[:,i-2])\n",
        "\n",
        "    # store CG iterate as next column of x_cg_pp\n",
        "    x_cg_pp[:,i] = x\n",
        "    print(f(x), \" = f( conjugate gradient parallel tangents iterate\", i, \")\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Quadratic Programming by Conjugate Gradient\n",
        "\n",
        "The 1D line minimizations in the parallel tangents method can be solved analytically given a quadratic objective, so the parallel tangents implementation can be transformed to use a couple of matrix vector products in place of the two golden section searches."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "-26.21188836379393  = f( conjugate gradient matrix vector product iterate 2 )\n",
            "-27.259320789625654  = f( conjugate gradient matrix vector product iterate 3 )\n",
            "-27.787499243490494  = f( conjugate gradient matrix vector product iterate 4 )\n",
            "-27.94014696474065  = f( conjugate gradient matrix vector product iterate 5 )\n",
            "-28.01339596058072  = f( conjugate gradient matrix vector product iterate 6 )\n",
            "-28.070399742437118  = f( conjugate gradient matrix vector product iterate 7 )\n",
            "-28.078784618436284  = f( conjugate gradient matrix vector product iterate 8 )\n",
            "-28.08440113135536  = f( conjugate gradient matrix vector product iterate 9 )\n"
          ]
        }
      ],
      "source": [
        "# perform CG method via matrix-vector products\n",
        "x_cg_mv = np.zeros((n, k))\n",
        "x_cg_mv[:,0] = x_kr[:,0]\n",
        "x_cg_mv[:,1] = x_kr[:,1]\n",
        "\n",
        "# this implementation mimics parallel tangents implementation,\n",
        "# one of the matrix vector products can in fact be avoided\n",
        "for i in range(2,k):\n",
        "    x = x_cg_mv[:,i-1]\n",
        "\n",
        "    # compute gradient\n",
        "    g = A@x + c\n",
        "\n",
        "    # compute explicit equation for optimal step size for steepest descent\n",
        "    a = - np.inner(g,g)/np.inner(g, A@g)\n",
        "\n",
        "    # update x\n",
        "    x_hat = x + a * g\n",
        "\n",
        "    # determine new optimization direction\n",
        "    d = x_hat - x_cg_mv[:,i-2]\n",
        "\n",
        "    # compute explicit equation for optimal step size in direction d = x_hat - x_{k-1}\n",
        "    a = - np.inner(g,d)/np.inner(d, A@d)\n",
        "\n",
        "    #update x\n",
        "    x = x_hat + a * d\n",
        "\n",
        "    # store CG iterate as next column of x_cg_mv\n",
        "    x_cg_mv[:,i] = x\n",
        "    print(f(x), \" = f( conjugate gradient matrix vector product iterate\", i, \")\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "It can be shown that each step of Conjugate gradient causes an update $\\b s_k = \\b x_{k+1} - \\b x_k$ that is $\\b A$-conjugate ($\\b A$-orthogonal) to the previous, i.e. $\\b s_i^T\\b A \\b s_{i-1}$:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "1.3322676295501878e-15\n",
            "-9.159339953157541e-16\n",
            "-9.298117831235686e-16\n",
            "1.4016565685892601e-15\n",
            "-1.4988010832439613e-15\n",
            "5.412337245047638e-16\n",
            "-4.0766001685454967e-16\n",
            "-9.84455572616838e-17\n"
          ]
        }
      ],
      "source": [
        "for i in range(2,k):\n",
        "    print(np.inner(x_cg_mv[:,i]-x_cg_mv[:,i-1],A @ (x_cg_mv[:,i-1]-x_cg_mv[:,i-2])))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "In fact $\\b s_i^T \\b A \\b s_j= 0$ if $i\\neq j$."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 39,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "[[ 7.91070129e+00  1.70653270e-15  2.01168380e-15 -3.36896273e-15\n",
            "  -1.41699059e-15  1.16368235e-15  1.75478151e-15 -2.56111365e-15\n",
            "   1.84025210e-16]\n",
            " [ 1.89155679e-15  7.34983534e+00 -7.92317776e-16  1.74278686e-15\n",
            "  -1.29368184e-15 -4.34070708e-17 -3.13304765e-16  2.11964906e-15\n",
            "  -2.90265624e-15]\n",
            " [ 2.12565734e-15 -1.16154774e-15  2.09486485e+00 -1.00963612e-15\n",
            "   6.34119276e-16  2.92235836e-16  3.25775056e-16 -2.92059891e-15\n",
            "   3.20694701e-16]\n",
            " [-3.37966962e-15  1.41572867e-15 -8.73998938e-16  1.05635691e+00\n",
            "   1.32537319e-15 -1.97282397e-16 -2.07301789e-15  5.77699721e-16\n",
            "   4.13470928e-16]\n",
            " [-1.26903931e-15 -1.51237348e-15  6.49143400e-16  1.25021495e-15\n",
            "   3.05295443e-01 -1.52468324e-15 -6.86809001e-16  1.32754777e-15\n",
            "   5.67663994e-16]\n",
            " [ 1.23201779e-15 -2.40700798e-16  2.30888083e-16 -2.30290266e-16\n",
            "  -1.46376434e-15  1.46497992e-01  4.97248520e-16 -3.04286527e-16\n",
            "   1.11648805e-16]\n",
            " [ 1.77314281e-15 -1.66636514e-16  3.28787680e-16 -2.07658876e-15\n",
            "  -6.31189598e-16  5.32346203e-16  1.14007564e-01 -4.13165340e-16\n",
            "  -2.46934144e-16]\n",
            " [-2.55447650e-15  2.07836958e-15 -2.91339344e-15  5.73393786e-16\n",
            "   1.33805749e-15 -2.97859601e-16 -4.01175437e-16  1.67697520e-02\n",
            "  -9.47010750e-17]\n",
            " [ 1.59594560e-16 -2.87270208e-15  3.53883589e-16  3.98986399e-16\n",
            "   5.58580959e-16  1.11239143e-16 -2.45029691e-16 -9.77950360e-17\n",
            "   1.12330258e-02]]\n"
          ]
        }
      ],
      "source": [
        "print((x_cg_mv[:,1:]-x_cg_mv[:,:-1]).T @ A @ (x_cg_mv[:,1:]-x_cg_mv[:,:-1]))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This fact permits a further optimization of the conjugate gradient iteration, resulting in only 1 matrx-vector product (the resulting work-efficient method can be found in many textbooks, papers, and on wikipedia)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": []
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.5.2"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}